Abstract. I consider the task of so-called curve fitting, or experimental data fitting, commonly encountered in various branches of scientific research. Unlike the traditional approach, I do not try to minimize any functional based on the available experimental information; instead, the minimization problem is replaced with a constraint satisfaction procedure, which produces the interval hull of solutions of the desired type. The method, called the slicing algorithm, is described in detail. The results obtained this way need not be labeled with a confidence level of any kind; they are simply certain (guaranteed). Additionally, the memory requirements of the presented method are very modest. The approach is directly applicable to other experimental data processing problems, such as outlier detection and finding the straight line which is tangent to the experimental
1 Stating the problem



In many branches of science the so-called problem of experimental data fitting is often encountered. It can be briefly described as follows:

Given a data set, i.e. a set of experimental observations, also called measurements, $\{(\mathbf{x}_j, \mathbf{y}_j)\}_{j=1}^{n}$, and a model, find the parameters of this model which adequately describe the data.

In the following I will assume that: 

• the values of both coordinates of each measurement may be uncertain, i.e. for each $x_j$ (respectively $y_j$) we know an interval $\mathbf{x}_j = [\underline{x}_j, \overline{x}_j]$ (resp. $\mathbf{y}_j = [\underline{y}_j, \overline{y}_j]$) which is guaranteed to contain the true, unknown value of the measured or controlled physical quantity,


• we are searching for the parameters of a linear model relating $x$ with $y$:

$$y = ax + b$$



The following considerations are directly applicable to many other two-parameter models, and the extension to even more complicated cases is also straightforward. I chose the linear model because it is very important, widely used and, at the same time, probably the simplest one.

In short, the goal is to find bounds for the two parameters, hereafter called $a$ and $b$, which reliably describe the experimental data. There are many procedures for solving such a problem, all depending on what exactly is meant by the best solution; LSQ (the least squares method) and LAD (least absolute deviations) are among the most popular. Yet even the interval counterparts of those methods do not deliver the results expected by experimentalists. A frequently encountered difficulty is the "cluster problem" (see [?] and [?]), which makes the precise location of a global extremum difficult. Additionally, as a consequence of clustering, the enclosures for physically interesting parameters are usually very pessimistic, to the point of being completely unusable. So why bother at all with one more interval method?



2 Deficiencies of existing methods 

The most popular and commonly used fitting methods nowadays are those based on probabilistic grounds, because the results of measurements are treated as random variables. There is nothing wrong with such an assumption; however, further treatment of the experimental data is more often than not based on additional, rarely explicitly stated, assumptions concerning the distributions of the measured quantities. It is assumed, and almost never checked, that the distributions are normal (Gaussian). Contrary to common belief, however, they usually are not. Today the vast majority of measurements are performed with digital measuring devices, so even if the investigated phenomenon is normally distributed, the set of its measurements, consisting of discrete values only, cannot be normally distributed.

There is also one more hypothesis in use, namely that the uncertainties (errors) are small. This is never checked, and indeed cannot be checked, since there is no way to influence the uncertainties of measurements once they have been performed.

Finally, all such methods, explicitly or implicitly, make use of the Central Limit Theorem, without ever taking into account that the conclusions drawn on those grounds are only asymptotically valid, in the limit of an infinite number of measurements.

The estimates of the parameters of interest obtained this way are given either as a pair of numbers, meaning the most probable value and its standard deviation (again silently assuming the normal distribution!), or, less often, as a confidence interval. The choice of the so-called confidence level, which is then a third number, remains arbitrary. Needless to say, the confidence level is only very loosely related, if at all, to the performed measurements.



3 Interval point of view 

In this paper we are going to find tight and guaranteed bounds for both parameters $a$ and $b$. To this end, we will search for intervals $\mathbf{a} = [\underline{a}, \overline{a}]$ and $\mathbf{b} = [\underline{b}, \overline{b}]$ containing with certainty the true values of $a$ and $b$, respectively. This is equivalent to finding the solutions of the following system of equations:

$$\begin{cases} \mathbf{a}\mathbf{x}_1 + \mathbf{b} = \mathbf{y}_1 \\ \quad\vdots \\ \mathbf{a}\mathbf{x}_n + \mathbf{b} = \mathbf{y}_n \end{cases} \qquad (1)$$

It is a system of $n > 2$ linear equations in only 2 unknowns. Since the number of available data exceeds the number of unknowns, the system is overdetermined and therefore generally has no solutions in the usual sense. Nevertheless, we will find intervals $\mathbf{a}$ and $\mathbf{b}$ such that the equations and the experimental data are, in some sense, consistent. According to Shary [3], there are many ways of specifying which kinds of solutions are of interest to us. To count them all, let us rewrite the system (1) in the standard matrix form:

$$\begin{pmatrix} \mathbf{x}_1 & 1 \\ \vdots & \vdots \\ \mathbf{x}_n & 1 \end{pmatrix} \begin{pmatrix} \mathbf{a} \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_n \end{pmatrix} \qquad (2)$$

To define a particular solution set, we have to assign one of two available quantifiers, $\forall$ or $\exists$, to each entry of the matrix of coefficients and to each component of the right-hand side vector (for details see [?] and the discussion that follows). This gives $2^{2n} \times 2^{n} = 2^{3n}$ possible assignments, corresponding to as many different solution sets. Fortunately, this rather huge number can be substantially reduced. First, the second column of the matrix of coefficients consists of real numbers only, not intervals. There is no point in assigning any quantifier to numbers originating from the degenerate interval [1, 1], since it contains exactly one number. Secondly, we do not want to single out any particular measurement. This means that the quantifiers have to be assigned in a special way: all $\mathbf{x}$'s should carry the same quantifier, and the same holds for the $\mathbf{y}$'s. Taking all this into account, we arrive at only 4 kinds of solutions of the system (1). They are as follows:

solutions in the usual sense:

$$\{(a, b) : \forall_{k=1\ldots n}\ \forall_{x \in \mathbf{x}_k}\ \forall_{y \in \mathbf{y}_k}\ ax + b = y\} \qquad (3)$$

This set is unbounded when there is only one experimental point, exact or not. When the number of measurements equals 2 and the measurements are exact (and different, i.e. $\mathbf{x}_1 \cap \mathbf{x}_2 = \emptyset$), it reduces to a single point. For two inexact measurements it is bounded when the measurements are disjoint, and unbounded otherwise. It is usually empty when we have 3 or more measurements, since it is impossible to draw a straight line connecting three or more arbitrarily chosen points, each belonging to its own rectangle $\mathbf{x}_k \times \mathbf{y}_k$, even if each of those rectangles consists of only one point. Solutions in the usual sense are therefore of no interest to experimentalists, except perhaps in the case $n = 2$, which is described later. For now let us assume that $n > 2$.

united solutions:

$$\{(a, b) : \forall_{k=1\ldots n}\ \exists_{x \in \mathbf{x}_k}\ \exists_{y \in \mathbf{y}_k}\ ax + b = y\} \qquad (4)$$

This set, if non-empty, consists by definition of all straight lines having at least one point in common with each of the rectangles $\mathbf{x}_1 \times \mathbf{y}_1$, $\mathbf{x}_2 \times \mathbf{y}_2$, ..., $\mathbf{x}_n \times \mathbf{y}_n$. In the language of set theory this may be expressed as

$$\forall_{k=1\ldots n}\ (\mathbf{a}\mathbf{x}_k + \mathbf{b}) \cap \mathbf{y}_k \neq \emptyset \qquad (5)$$

Of course, not every pair $(a, b) \in (\mathbf{a}, \mathbf{b})$ is a member of the solution set. Indeed, whenever we say "solutions", we really mean "the interval hull of the solution set". The same comment applies to the two remaining cases described below.

controllable solutions:

$$\{(a, b) : \forall_{k=1\ldots n}\ \forall_{y \in \mathbf{y}_k}\ \exists_{x \in \mathbf{x}_k}\ ax + b = y\} \qquad (6)$$

Treating intervals as sets, we can write the relation satisfied by the interval hull of solutions of this kind as

$$\forall_{k=1\ldots n}\ \mathbf{a}\mathbf{x}_k + \mathbf{b} \supseteq \mathbf{y}_k \qquad (7)$$

Figure 1: Geometric interpretation of the controllable solution. The straight line shown belongs to the set of controllable solutions, since it passes through all arbitrarily chosen short horizontal sections inside each experimental rectangle. It does not cross any of the vertical edges of any rectangle.



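Finally, we should consider
tolerable solutions:

$$\{(a, b) : \forall_{k=1\ldots n}\ \forall_{x \in \mathbf{x}_k}\ \exists_{y \in \mathbf{y}_k}\ ax + b = y\} \qquad (8)$$

which are contained in the box $(\mathbf{a}, \mathbf{b})$ for which the relation

$$\forall_{k=1\ldots n}\ \mathbf{a}\mathbf{x}_k + \mathbf{b} \subseteq \mathbf{y}_k \qquad (9)$$

holds.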

All the above relations should hold for every measurement; the names of the various solution sets follow Shary [3].
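For a single line, i.e. a point $(a, b)$ rather than a box, relations (5), (7) and (9) reduce to elementary comparisons between the range of $ax + b$ over $\mathbf{x}_k$ and the interval $\mathbf{y}_k$. The following Python sketch is my own illustration, not the author's program: intervals are plain `(lo, hi)` tuples, the helper names `line_image` and `classify` are hypothetical, and no outward rounding is performed.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper)


def line_image(a: float, b: float, x: Interval) -> Interval:
    """Range of a*x + b for x in the interval x; a and b are real numbers."""
    lo, hi = a * x[0] + b, a * x[1] + b
    return (min(lo, hi), max(lo, hi))


def classify(a: float, b: float,
             xs: Sequence[Interval], ys: Sequence[Interval]) -> Dict[str, bool]:
    """Check relations (5), (7) and (9) for the single line y = a*x + b."""
    pairs = [(line_image(a, b, x), y) for x, y in zip(xs, ys)]
    return {
        # (5): the image of x_k meets y_k
        "united": all(img[0] <= y[1] and y[0] <= img[1] for img, y in pairs),
        # (7): the image of x_k covers y_k
        "controllable": all(img[0] <= y[0] and y[1] <= img[1] for img, y in pairs),
        # (9): the image of x_k lies inside y_k
        "tolerable": all(y[0] <= img[0] and img[1] <= y[1] for img, y in pairs),
    }


# Narrow x-intervals and wide y-intervals around the line y = 2x + 1.
xs = [(0.9, 1.1), (1.9, 2.1), (2.9, 3.1)]
ys = [(2.5, 3.5), (4.5, 5.5), (6.5, 7.5)]
print(classify(2.0, 1.0, xs, ys))

# Wide x-intervals and narrow y-intervals around the same line.
xs_wide = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]
ys_narrow = [(2.8, 3.2), (4.8, 5.2), (6.8, 7.2)]
print(classify(2.0, 1.0, xs_wide, ys_narrow))
```

The same line is tolerable for the first data set and controllable for the second, and united in both, which is consistent with the inclusions stated in Section 4.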

In the strictly mathematical sense we have covered all kinds of solutions. Experimentalists may, however, be interested in one more type of "solutions", which will be called here crude solutions. They are defined in almost the same way as the united solutions, except that the relations (5) are required to hold for the majority of measurements, not necessarily for all of them. Thus every united solution is, of course, also a crude solution. Such "solutions" might be useful when analyzing data containing outliers. Let us postpone further discussion of crude solutions to a later part of this paper. Instead, consider yet another set, defined as the smallest one satisfying the conditions

$$\forall_{k=1\ldots n}\ \forall_{x \in \mathbf{x}_k}\ \mathbf{a}x + \mathbf{b} \supseteq \mathbf{y}_k \qquad (10)$$

which are very similar to those for the interval hull of controllable solutions. Put simply, the set just defined has the intuitive property that the graph of the expression $y = \mathbf{a}x + \mathbf{b}$ covers all experimental uncertainty rectangles in the $xy$ plane.

Figure 2: Geometric interpretation of the tolerable solution. The straight line shown belongs to the set of tolerable solutions, since it passes through all arbitrarily chosen short vertical sections inside each experimental rectangle. It does not cross any horizontal edge of any rectangle.

I shall now show that the set (10) is unbounded in $\mathbb{R}^2$. Suppose that we have fixed the parameter $\mathbf{a} = [a, a] \in \mathbb{IR}$ as a thin (degenerate) interval equal to an arbitrary real number $a$. It is easy to adjust the parameter $\mathbf{b} = [\underline{b}, \overline{b}] \in \mathbb{IR}$ so that (10) holds; it is enough to put

$$\underline{b} \le \min_{k} \inf\left(\underline{y}_k - a\,\mathbf{x}_k\right) \qquad (11)$$

and

$$\overline{b} \ge \max_{k} \sup\left(\overline{y}_k - a\,\mathbf{x}_k\right) \qquad (12)$$

The narrowest $\mathbf{b}$ corresponds, of course, to the case of equalities in (11) and (12). But since $a$ was arbitrary, the entire set is unbounded in $\mathbb{R}^2$, and so must be its convex (interval) hull. Thus we have shown that the smallest set of type (10), in the sense that it is contained in any other set having the required properties, does not exist.
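The argument can be checked numerically: for any fixed real $a$ the narrowest $\mathbf{b}$ satisfying (10) is given by equalities in (11) and (12), and it stays finite however large $|a|$ becomes. The Python sketch below uses the hypothetical helpers `covering_b` and `covers_all`, plain floating point without outward rounding, and a small made-up data set.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper)


def covering_b(a: float, xs: List[Interval], ys: List[Interval]) -> Interval:
    """Narrowest b = [b_lo, b_hi] such that a*x + b covers y_k for every x in x_k.

    Uses equalities in (11) and (12); for a real a the extremes of a*x over x_k
    are attained at the endpoints of x_k.
    """
    b_lo = min(y[0] - max(a * x[0], a * x[1]) for x, y in zip(xs, ys))  # (11)
    b_hi = max(y[1] - min(a * x[0], a * x[1]) for x, y in zip(xs, ys))  # (12)
    return (b_lo, b_hi)


def covers_all(a: float, b: Interval, xs: List[Interval], ys: List[Interval],
               tol: float = 1e-9) -> bool:
    """Check (10) at the endpoints of every x_k (enough, since a*x is linear in x)."""
    return all(a * e + b[0] <= y[0] + tol and a * e + b[1] >= y[1] - tol
               for x, y in zip(xs, ys) for e in x)


xs = [(0.9, 1.1), (1.9, 2.1), (2.9, 3.1)]
ys = [(2.5, 3.5), (4.5, 5.5), (6.5, 7.5)]
for a in (-1000.0, 0.0, 2.0, 1000.0):
    b = covering_b(a, xs, ys)
    print(a, b, covers_all(a, b, xs, ys))
```

Every slope admits a covering $\mathbf{b}$, and both endpoints of that $\mathbf{b}$ move off to infinity as $|a|$ grows, which is why no smallest set of type (10) exists.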

We will not explore the idea of finding the box with minimal volume, since this might lead to implausible or even unphysical results. Consider poor-quality experimental data (for the sake of simplicity we assume that both variables are dimensionless, so that the inequality below makes sense) for which

$$\operatorname{width}\left(\operatorname{hull}_k \mathbf{x}_k\right) < \operatorname{width}\left(\operatorname{hull}_k \mathbf{y}_k\right) \qquad (13)$$

where hull denotes the convex hull. For such data it may easily happen that the "fitted" line is perpendicular to the expected direction in the $xy$ plane!

In conclusion, we will not discuss the solutions of type (10) any further. There is no hope that they would constitute a good starting point for subsequent refinement to other, tighter types of solutions. It may even happen that a randomly selected set of "solutions" with property (10) has no elements in common with the united solution set.



4 Properties and usability of the remaining types of solutions

First note that the following inclusions always hold:

$$\text{tolerable solutions} \subseteq \text{united solutions} \qquad (14)$$

and

$$\text{controllable solutions} \subseteq \text{united solutions} \qquad (15)$$

This means that the set of united solutions is the "largest" one among all the true solutions (we exclude the crude solutions, since they are not true solutions in the sense that they need not hold for every experimental point).

The natural question arises: which kind of solutions (and why) should be used by experimentalists? The quick answer is that they all deserve our attention. Looking at Fig. 3 we can see that the sets of tolerable and/or controllable solutions potentially deliver better, i.e. more accurate (tighter) and thus tempting, estimates of the unknown parameters. Yet, for the experimentalist they are unreliable. Suppose that after some time a new experimental result, obtained with much better accuracy, becomes available. It may lie within one of the already known uncertainty rectangles but outside the domain marked with the letter T (or C) in Fig. 3. Obviously, in such circumstances the tolerable (or controllable) solution set will be empty, and no definite conclusion can be drawn as to whether our knowledge has really increased as a result of this new measurement. Driven by the widely accepted paradigm that an increase of knowledge is nothing other than a decrease of ignorance (and ignorance should never increase after a new, correctly performed measurement), we would rather prefer the united solution set. Moreover, the choice between the various kinds of solutions is rather limited, since usually either tolerable or controllable solutions will be available, if any, but not both. It is quite obvious that significant progress, i.e. better bounds for the unknown parameters, can only be achieved when new measurements of similar accuracy are taken outside the already investigated range of the controlled parameter $x$. Repeating measurements within the already explored range is unlikely to improve the bounds of the sought parameters significantly, unless the new measurements are definitely more accurate.

Figure 3: United (U) vs. tolerable (T) and controllable (C) solutions for two experimental points. Uncertain data are shown as light grey rectangles. Solid lines bound the regions, marked with the appropriate letters, where the straight lines belonging to each solution set can be drawn. For united solutions only the dark shaded region is inaccessible.

Summarizing the above considerations, I propose to interpret the various solution sets in the following uniform way:

• if united solutions exist, then the available data are in agreement with the model in use; there is no apparent contradiction between theory and data. Whether to call it "fair", "satisfactory", "good" or "excellent" agreement is a matter of taste rather than anything else. The use of such phrases may only be justified by comparison with similar results, especially concerning the widths of the intervals $\mathbf{a}$ and $\mathbf{b}$, obtained by different method(s) or by other authors.

• if no united solution exists (the two other solution sets are then empty too), then one of the following has happened:

  - one or more points are unreliable, i.e. their error bounds are underestimated (perhaps those of all points),

  - one or more points are outliers; this may be the result of a malfunctioning apparatus, errors in data transmission, or simply a human mistake when writing down the instrument's readings or typing them into a computer,

  - the linear model is not applicable to the phenomenon under investigation.

The latter may quite easily be the case in the physical sciences, where approximate, linearized models are used frequently. They are usable only until newer, better and more accurate results become available; then it might be time to revise, correct or even reject the current theory.

We can also look at the united solution set from another perspective: if it is empty, then we have a proof that our model is not adequate for the observed data. Inadequacy of the model may be bad news but, on the other hand, a proof of this fact is much more valuable than the result of any statistical test. For this to be true, we have to be sure that the uncertainties of all our data were estimated correctly and never underestimated.

A few words should be said about the two other types of solutions. The existence of tolerable or controllable solutions, in addition to the united solution set, has no particular meaning, at least for experimentalists. The lack of tolerable solutions may suggest that the uncertainties connected with the variable $x$, which is usually under control, are either overestimated or should be reduced in further experiments, perhaps with better apparatus. Tolerable solutions may only exist when those uncertainties are small enough; when all $x$'s are exact, the tolerable solutions are simply the united solutions. Similarly, controllable solutions exist only when the tolerances for all $y$'s are tight enough, and when those tolerances are zero, the controllable solutions are simply the united solutions. So the existence or non-existence of tolerable or controllable solutions may be regarded only as a hint of how well the measurements were performed. As the final outcome of an experiment we should, however, use only the intervals $\mathbf{a}$ and $\mathbf{b}$ derived from the united solution set, even though solutions of the other kinds usually look better.

5 The algorithm 

In this section I present the general strategy for finding the interval enclosure of every kind of solution of (1). It may be regarded as a functional counterpart of the interval functions ZERO1 and ZERO2 defined by van Emden in [?]. The method is called the box slicing or box peeling algorithm.

The first step is to convert (1) into an equivalent set of conditions, usually in the form of inequalities. We also need to determine an initial box V containing all potential solutions of the desired type. Let us defer the discussion of the exact form of those conditions and of the choice of the initial box until later. Our goal is to find the smallest interval box $(\mathbf{a}, \mathbf{b}) \in \mathbb{IR}^2$ containing all pairs $(a, b)$ for which the appropriate conditions are satisfied. The general outline of the algorithm is as follows:



Input
    Initial box $V \in \mathbb{IR}^2$ containing all solutions.

Algorithm
    For each unknown interval in turn, do the following:
    • try to slice the box V from the left; if slicing was successful, replace V with the new box;
    • try to slice the box V from the right; if slicing was successful, replace V with the new box.
    If any slicing, in any unknown, was successful, then repeat the procedure.

Output
    Tight interval hull $(\mathbf{a}, \mathbf{b})$ for the parameters $a$ and $b$.





5.1 What is slicing? 

Suppose that the current box is $V = (\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m) \in \mathbb{IR}^m$ and we are currently working with the parameter $\mathbf{p}_k$. Slicing from the left is described as a sequence of steps:

1. $\xi \leftarrow 1$

2. $\xi \leftarrow \xi / 2$

3. Divide V into two parts by cutting it with the plane $p_k = \tilde{p}_k$, where $\tilde{p}_k = \underline{p}_k + \xi\,(\overline{p}_k - \underline{p}_k)$. Call the newly created subboxes slice ($p_k \le \tilde{p}_k$) and rest ($p_k \ge \tilde{p}_k$).

4. Probe slice, i.e. determine whether or not the subbox slice fails the considered system of inequalities.

   If slice fails the system of inequalities then
   • V ← rest (discard slice);
   • finish slicing from the left with parameter $p_k$; exit with flag success;
   else (probe a thinner slice)
   • if the termination criteria are not met, then go to step 2; otherwise finish slicing from the left with parameter $p_k$ and exit with flag no success.

Slicing from the right is similar, except that at the beginning $\xi$ is set to 0, the later updates have the form $\xi \leftarrow (1 + \xi)/2$, and slice is the part with $p_k \ge \tilde{p}_k$.
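The outline of Section 5 together with the slicing steps above can be sketched in Python as follows. This is my own minimal reading of the description, not the author's FORTRAN code: boxes are lists of `(lo, hi)` tuples, `fails` is a hypothetical user-supplied rejection test that returns True when a subbox provably contains no solution, and a single threshold `eps` stands in for the termination criterion discussed next. The toy test `misses_target` (also hypothetical) discards subboxes disjoint from a fixed rectangle, loosely mimicking the target of Fig. 4.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]          # (lower, upper)
Box = List[Interval]                    # one interval per unknown
Fails = Callable[[Box], bool]           # True: subbox provably contains no solution


def slice_left(box: Box, k: int, fails: Fails, eps: float) -> bool:
    """Steps 1-4: try to discard a slice p_k <= cut; modifies box in place."""
    lo, hi = box[k]
    xi = 1.0                                   # step 1
    while True:
        xi /= 2.0                              # step 2
        if xi < eps:                           # termination criterion met
            return False                       # no success
        cut = lo + xi * (hi - lo)              # step 3: cutting plane
        slab = list(box)
        slab[k] = (lo, cut)                    # the subbox "slice"
        if fails(slab):                        # step 4: probing
            if cut <= lo:                      # no progress in floating point
                return False
            box[k] = (cut, hi)                 # V <- rest
            return True                        # success


def slice_right(box: Box, k: int, fails: Fails, eps: float) -> bool:
    """Mirror image: xi = 1/2, 3/4, 7/8, ... and the slice is p_k >= cut."""
    lo, hi = box[k]
    xi = 0.0
    while True:
        xi = (1.0 + xi) / 2.0
        if 1.0 - xi < eps:
            return False
        cut = lo + xi * (hi - lo)
        slab = list(box)
        slab[k] = (cut, hi)
        if fails(slab):
            if cut >= hi:
                return False
            box[k] = (lo, cut)
            return True


def peel(box: Box, fails: Fails, eps: float = 1e-9) -> Box:
    """Outer loop: slice every unknown from both sides until no slicing succeeds."""
    box = list(box)
    changed = True
    while changed:
        changed = False
        for k in range(len(box)):
            changed |= slice_left(box, k, fails, eps)
            changed |= slice_right(box, k, fails, eps)
    return box


# Toy rejection test: a subbox fails if it is disjoint from a fixed target rectangle.
TARGET: Box = [(1.0, 2.0), (3.0, 4.0)]


def misses_target(box: Box) -> bool:
    return any(hi < t_lo or lo > t_hi
               for (lo, hi), (t_lo, t_hi) in zip(box, TARGET))


result = peel([(-10.0, 10.0), (-10.0, 10.0)], misses_target)
print(result)  # a box slightly larger than TARGET
```

Because only subboxes that provably fail are discarded, the result always encloses the target; the threshold `eps` only controls how close to it the peeling gets.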

A few comments are in order. As seen from the description, at the beginning the algorithm probes large chunks of the initial box V: the first slice has the same volume as the remaining part of V, while subsequent slices become smaller and smaller. We stop slicing in the direction of the current parameter at the first success and then immediately switch to the next parameter, following van Emden's suggestions.

What is the termination criterion? The most obvious one is based on the width of a slice. Unsuccessful slicing should stop, at the latest, when the width of a slice in the direction of the currently processed parameter becomes small, comparable with the machine accuracy. One might think that we should terminate slicing in a given direction even earlier, at some predefined threshold $\varepsilon$: slicing ceases when $\xi < \varepsilon$ (slicing from the left) or $1 - \xi < \varepsilon$ (slicing from the right), where $\varepsilon$ is a small, arbitrarily chosen positive number, usually of the form $10^{-k}$ for a moderate integer $k$. This choice, however, does not guarantee obtaining the tightest possible bounds for the sought parameters. Nevertheless, it may be practical in terms of CPU time, and it is quite sufficient when processing experimental data.

Concluding, we may estimate the time complexity of a single cycle of slicing all unknowns, in the worst case, as being proportional to $m$, the number of unknowns, and to $n$, the number of experimental points: $C_t = 2Kmn$; effectively $C_t = O(n)$, since $m = 2$ is fixed. The factor 2 comes from the fact that slicing is always two-sided. $K$, the proportionality constant, is roughly equal to the number of bits in the mantissa plus twice the largest exponent used in the floating-point representation of real numbers. One must remember, however, that a single iteration, involving all unknowns in turn, will only rarely suffice. Fortunately, in the linear case considered here, all the necessary intervals, even if calculated in the natural ("naive") way, have sharp ends; see Hansen's theorems on sharpness [?]. That is why the computed interval enclosures are tight, also in the case of multivariate linear or linearized models. The above statement is not necessarily true when the model is nonlinear.
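A small numerical illustration of the two cost estimates above, under stated assumptions: an unsuccessful one-sided slicing with threshold $\varepsilon$ performs about $\log_2(1/\varepsilon)$ probes, and for IEEE 754 double precision (my assumption about the arithmetic in use) the constant $K$, taken as mantissa bits plus twice the largest exponent, can be read from Python's `sys.float_info`. The function names are hypothetical.

```python
import sys


def probes_until_threshold(eps: float) -> int:
    """Probes made by one unsuccessful slicing from the left: xi = 1/2, 1/4, ... while xi >= eps."""
    count, xi = 0, 1.0
    while True:
        xi /= 2.0
        if xi < eps:
            return count
        count += 1


# K ~ (bits in the mantissa) + 2 * (largest binary exponent), here for IEEE 754 doubles.
K_DOUBLE = sys.float_info.mant_dig + 2 * sys.float_info.max_exp


def worst_case_cycle_cost(n: int, m: int = 2, k_const: int = K_DOUBLE) -> int:
    """C_t = 2 K m n: two-sided slicing of m unknowns, each probe touching n measurements."""
    return 2 * k_const * m * n


print(probes_until_threshold(1e-6), K_DOUBLE, worst_case_cycle_cost(10))  # 19 2101 84040
```

With a threshold of $10^{-6}$ an unsuccessful slicing stops after 19 probes, roughly a hundred times fewer than the machine-limited $K$, which is the CPU-time argument for an explicit $\varepsilon$.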

The space complexity of the box slicing algorithm is very attractive. At any stage of the calculations we are working with at most 3 interval boxes (original, slice and rest), each of a size proportional to the number of unknowns ($m = 2 = \text{const}$), so $C_s = O(1)$.

5.2 Probing 

The purpose of probing is to determine whether the given box contains points with the required properties, in our case the solutions. Boxes which certainly contain no point of interest are eliminated from further consideration.

Figure 4: The bounding box denotes the initial search domain in the $ab$ plane. The target, slightly oversized, is a rectangle located near the center of the figure. The two remaining lines, solid and dashed, divide the initial box into parts for which some conditions are, or are not, met. The region bounded by the dashed line is labelled I, and the region bounded by the solid line is labelled II. Details concerning the remaining domains III to VII are given in the text.

Proving the existence of solutions within the probed domain is, in general, not straightforward. Therefore, during probing we will rather seek every opportunity to discard the (sub)box under study. The tests ("questions") we are going to use during probing have to be carefully selected, since

. . . probing has a logic of its own.

M.H. van Emden [?]

Suppose that $p < q$ and we have obtained, for some system of inequalities $\Sigma$:

• $\Sigma$ is non-failed for $x < q$, and

• $\Sigma$ is non-failed for $x > p$.

Can we say anything about the localization of the solutions of $\Sigma$, especially within the interval $[p, q]$? No; on the other hand, the result

($\Sigma$ is failed for $x < q$) and ($\Sigma$ is failed for $x > p$)

is a proof that $\Sigma$ has no solutions at all, while

($\Sigma$ is failed for $x > q$) and ($\Sigma$ is failed for $x < p$)

implies that the solutions, if any, must be located within the interval $[p, q]$ (there are no solutions outside this interval).

Consider Fig. 4. Think of the situation shown there as applying to just one experimental point, say the first one, and to all the inequalities in which this point is explicitly involved. For the sake of simplicity, from now on we drop the index numbering the experimental points. Let us concentrate first on the united solution set. We are going to find $\mathbf{a}$ and $\mathbf{b}$ such that

$$(\mathbf{a}\mathbf{x} + \mathbf{b}) \cap \mathbf{y} \neq \emptyset \qquad (16)$$

The opposite condition, in which the intervals $\mathbf{a}\mathbf{x} + \mathbf{b}$ and $\mathbf{y}$ are disjoint, is of greater value for our purposes. It may be written in conventional interval notation (where $\mathbf{u} < \mathbf{v}$ means $\overline{u} < \underline{v}$) as

$$(\mathbf{a}\mathbf{x} + \mathbf{b} < \mathbf{y}) \vee (\mathbf{a}\mathbf{x} + \mathbf{b} > \mathbf{y}) \qquad (17)$$

The alternative (17) gives the correct answer to the question whether condition (16) fails. Boxes $(\mathbf{a}, \mathbf{b})$ for which (17) is true can be safely discarded from further consideration, since all points belonging to them violate (16). All such boxes are located outside the regions marked I and II in Fig. 4, i.e. they may be found in regions III, IV or VII. So the alternative (17) may be used as a rejection criterion by the box slicing algorithm. Consider, however, what happens if the united solution set is empty. In that case we finish with a very small box and are still not sure whether or not it contains any solutions. In contrast to ordinary point calculations, we cannot check this with a single calculation: such a box still contains uncountably many points.

The solution of this dilemma is quite simple. Starting with the initial box V, we discard those of its parts which satisfy the first term of the alternative (17), obtaining as a result a box $V_d \subseteq V$. The box $V_d$ covers region I in Fig. 4. Then we repeat the procedure, again starting with V, but this time using the second term of (17) as the rejection criterion. Now the resulting box is $V_u \subseteq V$, covering domain II in Fig. 4. The solutions, if any, must be located in the intersection $V_d \cap V_u$. This intersection, if not empty, becomes the new initial box for another iteration. Continuing this process, we obtain better and better interval hulls of the regions marked V and VI in Fig. 4. The procedure terminates ("eventually stabilizes" in the language of [?]) when $V_u = V_d = V$, or, in other words, when slicing becomes an idempotent operation. An empty intersection of $V_u$ and $V_d$, at any stage of the calculations, constitutes a proof that the solution set is empty.
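The $V_d$/$V_u$ iteration for united solutions can be sketched as follows, again only as an illustration under assumptions: naive interval arithmetic in plain floating point (no outward rounding, so the result is not rigorously guaranteed), a compact variant of the slicing routine, and the hypothetical names `peel` and `united_hull`. The first term of (17) drives the peeling of $V_d$ and the second term the peeling of $V_u$; an empty intersection is reported as `None`. I only claim that the returned box encloses every united solution, not that it reproduces the hull reported in Section 6. The demo uses the data of Table 1 with full radii.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]   # (lower, upper)
Box = List[Interval]             # [a-interval, b-interval]


def mul(p: Interval, q: Interval) -> Interval:
    c = (p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1])
    return (min(c), max(c))


def image(box: Box, x: Interval) -> Interval:
    """Naive interval extension of a*x + b over the box (a, b)."""
    ax = mul(box[0], x)
    return (ax[0] + box[1][0], ax[1] + box[1][1])


def peel(box: Box, fails: Callable[[Box], bool], eps: float) -> Box:
    """Compact box slicing; xi is the slice width relative to the current interval."""
    box = list(box)
    changed = True
    while changed:
        changed = False
        for k in range(len(box)):
            for from_left in (True, False):
                lo, hi = box[k]
                xi = 0.5
                while xi >= eps:
                    cut = lo + xi * (hi - lo) if from_left else hi - xi * (hi - lo)
                    slab = list(box)
                    slab[k] = (lo, cut) if from_left else (cut, hi)
                    if lo < cut < hi and fails(slab):
                        box[k] = (cut, hi) if from_left else (lo, cut)
                        changed = True
                        break
                    xi /= 2.0
    return box


def united_hull(xs: Sequence[Interval], ys: Sequence[Interval], box: Box,
                eps: float = 1e-6, max_iter: int = 200) -> Optional[Box]:
    def below(b: Box) -> bool:   # first term of (17): a*x_k + b < y_k for some k
        return any(image(b, x)[1] < y[0] for x, y in zip(xs, ys))

    def above(b: Box) -> bool:   # second term of (17): a*x_k + b > y_k for some k
        return any(image(b, x)[0] > y[1] for x, y in zip(xs, ys))

    box = list(box)
    for _ in range(max_iter):
        vd, vu = peel(box, below, eps), peel(box, above, eps)
        new = [(max(d[0], u[0]), min(d[1], u[1])) for d, u in zip(vd, vu)]
        if any(lo > hi for lo, hi in new):
            return None               # empty intersection: no united solutions
        if new == box:
            return box                # slicing has become idempotent
        box = new
    return box                        # iteration cap reached; still an enclosure


x_centers = [0.9, 1.9, 2.9, 3.9, 5.4, 5.9, 6.9, 8.7, 9.1, 10.1]
y_centers = [3.65, 4.60, 5.65, 6.60, 8.00, 8.55, 9.60, 11.30, 12.75, 13.70]
y_radii = [0.45, 0.40, 0.22, 0.40, 0.50, 0.35, 0.50, 0.50, 0.55, 0.30]
xs = [(c - 0.1, c + 0.1) for c in x_centers]
ys = [(c - r, c + r) for c, r in zip(y_centers, y_radii)]
hull = united_hull(xs, ys, [(-10.0, 10.0), (-10.0, 10.0)])
print(hull)
```

Comparing the printed box with the hull quoted in Section 6 is a useful sanity check; the iteration cap only guards against slow convergence and does not affect enclosure.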

So asking the proper questions during probing is not a trivial matter. The main difficulty lies in the construction of appropriate rejection tests. Only tests $Q$ with the property (for $V \subseteq W \in \mathbb{IR}^2$)

$$[\,Q(V) \text{ is non-failed}\,] \Rightarrow [\,Q(W) \text{ is non-failed}\,] \qquad (18)$$

are appropriate. This is because we usually start with a severely overestimated initial box known to contain solutions, and we do not want it to be rejected as a whole by the first applied test.

5.3 Rejection tests and search regions for various kinds of solutions

5.3.1 United solutions 

In the case of just two different measurements ($n = 2$, $\mathbf{x}_1 \cap \mathbf{x}_2 = \emptyset$) the problem may be quickly solved "analytically":

$$\mathbf{a} = \frac{\mathbf{y}_2 - \mathbf{y}_1}{\mathbf{x}_2 - \mathbf{x}_1} \qquad (19)$$

$$\mathbf{b} = (\mathbf{y}_1 - \mathbf{a}\mathbf{x}_1) \cap (\mathbf{y}_2 - \mathbf{a}\mathbf{x}_2) \qquad (20)$$

The author cannot resist the temptation to comment on the elegance of expression (20): both measurements are treated in a perfectly symmetric manner. Despite this elegance, the expression for $\mathbf{b}$ does not necessarily produce a tight interval enclosure of $b$, while the enclosure (19) for $a$ is tight (sharp in Hansen's terminology, see [?]). This is due to the well-known dependency problem, which is unavoidable here: $\mathbf{b}$ is expressed in terms of, among others, $\mathbf{a}$, $\mathbf{x}_1$ and $\mathbf{y}_1$, while $\mathbf{a}$ itself was calculated from the same variables. The lack of tightness is not a real problem here, since we can rectify it using the box slicing algorithm, which by its construction always produces tight interval enclosures in the linear case.
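Formulae (19) and (20) translate directly into naive interval arithmetic. The Python sketch below is illustrative only: intervals are `(lo, hi)` tuples, the helper names are hypothetical, rounding is not directed outwards, and `x1`, `x2` are assumed disjoint so that the denominator of (19) does not contain zero. The numbers are the first two rows of Table 1 with full radii.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]   # (lower, upper)


def sub(p: Interval, q: Interval) -> Interval:
    return (p[0] - q[1], p[1] - q[0])


def mul(p: Interval, q: Interval) -> Interval:
    c = (p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1])
    return (min(c), max(c))


def div(p: Interval, q: Interval) -> Interval:
    if q[0] <= 0.0 <= q[1]:
        raise ZeroDivisionError("denominator interval contains zero")
    c = (p[0] / q[0], p[0] / q[1], p[1] / q[0], p[1] / q[1])
    return (min(c), max(c))


def intersect(p: Interval, q: Interval) -> Optional[Interval]:
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None


def two_point(x1: Interval, y1: Interval, x2: Interval, y2: Interval):
    """Enclosures (19) and (20) for the lines meeting both measurement rectangles."""
    a = div(sub(y2, y1), sub(x2, x1))                          # (19): sharp
    b = intersect(sub(y1, mul(a, x1)), sub(y2, mul(a, x2)))    # (20): may be loose
    return a, b


# First two rows of Table 1, full radii.
a, b = two_point((0.8, 1.0), (3.2, 4.1), (1.8, 2.0), (4.2, 5.0))
print("a =", a, "b =", b)
```

Here $\mathbf{a} \approx [0.0833, 2.25]$ and $\mathbf{b} \approx [0.95, 4.033]$; as noted above, this $\mathbf{b}$ need not be the tight enclosure and can be refined by box slicing.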

The construction given by formulae (19) and (20) can be used to establish the bounds of the initial box V, in which we will search for solutions (of any kind) of the original linear problem. It is sufficient to set

$$V = (\mathbf{a}, \mathbf{b}) = \operatorname{hull} \bigcup_{j,k} (\mathbf{a}_{jk}, \mathbf{b}_{jk}) \qquad (21)$$

where $\mathbf{a}_{jk}$ and $\mathbf{b}_{jk}$ are the intervals obtained from measurements $j$ and $k$. The convex hull written above covers the results for all pairs $(j, k)$ of experimental data satisfying $\mathbf{x}_j \cap \mathbf{x}_k = \emptyset$. The calculation of (21) thus has complexity $O(n^2)$, which may seem excessive, especially when the number of measurements is large. Instead, we may start with the initial box $V = ([-w, w], [-w, w])$, where $w$ is some sufficiently large number, at the price of an increased number of iterations later.

In summary, in order to find united solutions we need the initial box given by (21) and the pair of rejection rules given in (17).
5.3.2 Tolerable solutions



These, if they exist, form a subset of the united solutions. For this reason the initial box may be constructed in the same way as for united solutions, see (21). What we need are rejection rules appropriate for this case. Rules (17) are not sufficient: we need to discard boxes having no common parts with any tolerable solution, not just with the experimental points. One might then try to find at least one tolerable solution first. Unfortunately, this is a bad idea. Driven by such a condition, the procedure would converge not to the convex hull of the tolerable solutions, but to this single, specific solution instead. Besides, we still do not know how to find the first tolerable solution.

Let us start by calculating

$$V^{\mathrm{tol}} = (\mathbf{a}^{\mathrm{tol}}, \mathbf{b}^{\mathrm{tol}}) = \Big(\bigcap_{j,k} \mathbf{a}_{jk},\ \bigcap_{j,k} \mathbf{b}_{jk}\Big) \qquad (22)$$

which can be done in the same loop in which (21) is calculated. The box $V^{\mathrm{tol}}$, if not empty, contains all straight lines having common points with each experimental rectangle. If it is empty, then tolerable solutions cannot exist. Let us also calculate the auxiliary intervals, one for each measurement:

$$\mathbf{y}'_k = \mathbf{a}^{\mathrm{tol}}\mathbf{x}_k + \mathbf{b}^{\mathrm{tol}} \qquad (23)$$
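Now the rejection rules may be expressed as follows. Reject the box $(\mathbf{a}, \mathbf{b})$ if, for at least one measurement, say $k$,

$$(\mathbf{a}\mathbf{x}_k + \mathbf{b}) \cap \mathbf{y}'_k = \emptyset \qquad (24)$$

When calculating $V_d$, reject also boxes satisfying the condition

$$(\mathbf{a}\underline{x}_k + \mathbf{b} < \underline{y}_k) \vee (\mathbf{a}\overline{x}_k + \mathbf{b} < \underline{y}_k) \qquad (25)$$

for at least one measurement, while for $V_u$ use as the rejection criterion

$$(\mathbf{a}\underline{x}_k + \mathbf{b} > \overline{y}_k) \vee (\mathbf{a}\overline{x}_k + \mathbf{b} > \overline{y}_k) \qquad (26)$$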

5.3.3 Controllable solutions 

Controllable solutions, if they exist, also form a subset of the united solutions, so the starting box can again be constructed using (21). Following the prescription (22) we construct

$$V^{\mathrm{con}} = (\mathbf{a}^{\mathrm{con}}, \mathbf{b}^{\mathrm{con}}) \qquad (27)$$

and then the auxiliary intervals

$$\mathbf{y}'_k = \mathbf{a}^{\mathrm{con}}\mathbf{x}_k + \mathbf{b}^{\mathrm{con}} \qquad (28)$$

If $V^{\mathrm{con}} = \emptyset$, then controllable solutions cannot exist. We discard all boxes $(\mathbf{a}, \mathbf{b})$ satisfying (24) for at least one measurement. Additionally, when dealing with $V_d$ we use

$$\min(\mathbf{a}\underline{x}_k + \mathbf{b},\ \mathbf{a}\overline{x}_k + \mathbf{b}) > \underline{y}_k \qquad (29)$$

and when slicing $V_u$ the condition

$$\max(\mathbf{a}\underline{x}_k + \mathbf{b},\ \mathbf{a}\overline{x}_k + \mathbf{b}) < \overline{y}_k \qquad (30)$$

as the rejection criterion. It happens quite often that initially $V^{\mathrm{con}} \neq \emptyset$ but the set of controllable solutions is empty anyway.
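The extra rejection rules (25), (26), (29) and (30) only compare naive interval values of $\mathbf{a}x + \mathbf{b}$ at the endpoints of $\mathbf{x}_k$ with one endpoint of $\mathbf{y}_k$. Below is a Python sketch of my reading of them, not the author's code: the function names are hypothetical, rule (24) is omitted because it needs the auxiliary intervals, boxes are `[a-interval, b-interval]` lists of `(lo, hi)` tuples, and the two small data sets are made up so that the line $y = 2x + 1$ is tolerable for the first and controllable for the second.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]   # (lower, upper)
Box = List[Interval]             # [a-interval, b-interval]


def at(box: Box, x: float) -> Interval:
    """Naive interval value of a*x + b at a real abscissa x."""
    (a_lo, a_hi), (b_lo, b_hi) = box
    p, q = a_lo * x, a_hi * x
    return (min(p, q) + b_lo, max(p, q) + b_hi)


def tol_reject_d(box: Box, xs: Sequence[Interval], ys: Sequence[Interval]) -> bool:
    """(25): at an endpoint of some x_k the line is certainly below y_k."""
    return any(at(box, e)[1] < y[0] for x, y in zip(xs, ys) for e in x)


def tol_reject_u(box: Box, xs: Sequence[Interval], ys: Sequence[Interval]) -> bool:
    """(26): at an endpoint of some x_k the line is certainly above y_k."""
    return any(at(box, e)[0] > y[1] for x, y in zip(xs, ys) for e in x)


def con_reject_d(box: Box, xs: Sequence[Interval], ys: Sequence[Interval]) -> bool:
    """(29): min(a*x_lo + b, a*x_hi + b) lies certainly above the lower end of y_k."""
    return any(min(at(box, x[0])[0], at(box, x[1])[0]) > y[0] for x, y in zip(xs, ys))


def con_reject_u(box: Box, xs: Sequence[Interval], ys: Sequence[Interval]) -> bool:
    """(30): max(a*x_lo + b, a*x_hi + b) lies certainly below the upper end of y_k."""
    return any(max(at(box, x[0])[1], at(box, x[1])[1]) < y[1] for x, y in zip(xs, ys))


near: Box = [(1.99, 2.01), (0.99, 1.01)]      # tiny box around (a, b) = (2, 1)
far: Box = [(10.0, 11.0), (10.0, 11.0)]       # lines far above the data
low: Box = [(0.0, 0.01), (-5.0, -4.99)]       # lines far below the data

xs_t, ys_t = [(0.9, 1.1), (1.9, 2.1), (2.9, 3.1)], [(2.5, 3.5), (4.5, 5.5), (6.5, 7.5)]
xs_c, ys_c = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)], [(2.8, 3.2), (4.8, 5.2), (6.8, 7.2)]
print(tol_reject_d(near, xs_t, ys_t), tol_reject_u(far, xs_t, ys_t))
print(con_reject_u(near, xs_c, ys_c), con_reject_d(far, xs_c, ys_c))
```

Each test is monotone in the sense of (18): enlarging a box can only widen the interval values, so a box that passes a test is never rejected after being enlarged.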

6 An Example 

Using a FORTRAN program described in [?], I have fitted an artificial data set consisting of 10 points with uncertainties in both variables. As the uncertainties $\sigma_x$ and $\sigma_y$ of each experimental measurement I have used one third of the radii of the corresponding intervals, i.e. one sixth of their widths. The data are listed in Table 1 and the results in Table 2.



Table 1: Data used in exemplary calculations; σx and σy were taken as rounded third parts of the corresponding radii.

center of x  radius of x  σx     center of y  radius of y  σy
0.9          0.1          0.333  3.65         0.45         0.150
1.9          0.1          0.333  4.60         0.40         0.133
2.9          0.1          0.333  5.65         0.22         0.073
3.9          0.1          0.333  6.60         0.40         0.133
5.4          0.1          0.333  8.00         0.50         0.167
5.9          0.1          0.333  8.55         0.35         0.117
6.9          0.1          0.333  9.60         0.50         0.167
8.7          0.1          0.333  11.30        0.50         0.167
9.1          0.1          0.333  12.75        0.55         0.183
10.1         0.1          0.333  13.70        0.30         0.100



Table 2: Results produced by the program taken from […], as recorded from the computer screen, without any rounding.

parameter  value         parameter  value
a_LSQ      1.08530271    σ_a        0.0136506381
b_LSQ      2.43730211    σ_b        0.0823259652



I have also obtained the interval hull of the united solution set for this problem, using full uncertainties for both variables. The result is (rounded outwards to five decimal places):

$$ (\mathbf{a}, \mathbf{b}) = ([1.02270, 1.13159],\ [2.06840, 2.96827]) $$

As expected, $a_{LSQ} \in \mathbf{a}$ and $b_{LSQ} \in \mathbf{b}$. Using those values, Table 3 was then prepared. In order to calculate the "corridor of errors" ($y_{LSQ} - 3\sigma$ and $y_{LSQ} + 3\sigma$ in Table 3) for the LSQ results, I have mechanically adopted the widely used formula, taking as $\sigma$ the following quantity:

$$ \sigma = \sqrt{|x\,\sigma_a|^2 + |\sigma_b|^2} \tag{31} $$



Table 3: Results of exemplary calculations. The intervals were rounded outwards to two decimal places, while the numbers labeled as $y_{LSQ} \pm 3\sigma$, resulting from point calculations, were rounded conventionally. The last column contains the ratio of the widths of the corresponding uncertainty estimates: interval to that produced by the Least SQuares method. The first and last rows do not correspond to any measurement; they are included to illustrate the extrapolation behavior of both approaches.



x      y       y_fit lower  y_fit upper  y_LSQ - 3σ  y_LSQ + 3σ  w_INT/w_LSQ
0.0            2.06         2.97         2.434       2.441       120
0.8    3.20    2.88         3.88         3.105       3.506       2.494
1.0    4.10    3.09         4.10         3.272       3.773       2.016
1.8    4.20    3.90         5.01         3.943       4.839       1.239
2.0    5.00    4.11         5.24         4.111       5.105       1.137
2.8    5.43    4.93         6.14         4.781       6.171       0.871
3.0    5.87    5.13         6.37         4.949       6.438       0.833
3.8    6.20    5.95         7.27         5.620       7.503       0.701
4.0    7.00    6.15         7.50         5.787       7.770       0.681
5.3    7.50    7.48         8.97         6.877       9.502       0.568
5.5    8.50    7.69         9.20         7.045       9.768       0.555
5.8    8.20    8.00         9.54         7.296       10.168      0.536
6.0    8.90    8.20         9.76         7.464       10.434      0.525
6.8    9.10    9.02         10.67        8.135       11.500      0.490
7.0    10.10   9.22         10.89        8.302       11.767      0.482
8.6    10.80   10.86        12.70        9.644       13.898      0.433
8.8    11.80   11.06        12.93        9.811       14.165      0.429
9.0    12.20   11.27        13.16        9.979       14.431      0.425
9.2    13.30   11.47        13.38        10.147      14.698      0.420
10.0   13.40   12.29        14.29        10.817      15.763      0.404
10.2   14.00   12.50        14.52        10.985      16.030      0.400
20.0           22.52        25.60        19.200      29.086      0.312



As can be seen from Table 3, within the range where the experimental points were taken, the results are comparable. However, if one examines the behavior of both methods outside this domain, the predictive power of the interval method is clearly superior. The LSQ uncertainty estimate for y at $x = 0$ seems at least unrealistic, while for $x = 20$ it is more than three times wider than the extrapolation based on the interval result.
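To make the comparison concrete, the interval columns of Table 3 can be recomputed from the hull $(\mathbf{a}, \mathbf{b})$ given above. A minimal sketch, assuming the five-decimal hull values as input and ordinary floating-point arithmetic; the function names `predict`, `round_out` and `width_ratio` are hypothetical.

```python
import math

# Interval hull printed above (already rounded outwards to five decimals).
A = (1.02270, 1.13159)   # slope a
B = (2.06840, 2.96827)   # intercept b


def predict(x, a=A, b=B):
    """Range of a*x + b over the box (a, b) at a single abscissa x (any sign)."""
    ends = (a[0] * x, a[1] * x)
    return min(ends) + b[0], max(ends) + b[1]


def round_out(lo, hi, digits=2):
    """Round an interval outwards, as stated for the interval columns of Table 3."""
    scale = 10 ** digits
    return math.floor(lo * scale) / scale, math.ceil(hi * scale) / scale


def width_ratio(interval, lsq):
    """Last column of Table 3: interval width divided by the LSQ corridor width."""
    return (interval[1] - interval[0]) / (lsq[1] - lsq[0])


for x in (0.0, 0.8, 1.0, 20.0):
    print(x, round_out(*predict(x)))
```

This gives [2.06, 2.97] at x = 0, [2.88, 3.88] at x = 0.8 and [3.09, 4.10] at x = 1.0, as in Table 3, and `width_ratio((2.88, 3.88), (3.105, 3.506))` is about 2.494. At x = 20 the sketch yields an upper bound of 25.61 rather than 25.60, presumably because the printed hull had itself already been rounded outwards.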

It should be noted that the problem of interpolating interval-valued experimental data has already been investigated in […] (Lagrange interpolating polynomials) and in […] (essentially a decomposition into a set of basis functions, i.e. generalized polynomials). In both cases the uncertainty appeared in only one (dependent) variable, and no conclusions concerning the physical meaning or validity of the obtained parameters could be drawn.

7 Other applications in experimental sciences 
7.1 Detection of outliers 

Suppose that we are trying to find the convex hull of the united solutions for some linear problem and it turns out to be empty. This means that one or more data points are outliers. There are several methods, heuristic as well as based on probability calculus, which make the identification of outliers possible. Note that, contrary to interval methods, the LSQ method always produces some estimates of the unknown parameters, regardless of the presence of outliers. Outliers are most easily spotted when the data and the best-fit line are plotted together on the same graph, which is a rather tedious task if performed manually. In control applications, either industrial or in the laboratory, when the environment is noisy, misreadings or data transmission errors may occur quite frequently and go undetected. This may seriously affect the quality and reliability of the control procedures, sometimes leading to disastrous effects or even fatalities.

The advantage of interval methods over the traditional LSQ method is then evident: no outlier can go unnoticed. Nevertheless, the question of its identification still remains. We should mention here that other methods usually fail when the data set contains more than a single outlier, or when there are just two outliers located next to each other.

I propose the following procedure for outlier detection: repeatedly find the united solution set for the experimental data using the relaxed conditions (…), i.e. dropping the requirement that the condition be true for every measurement. In other words, we search for the crude solutions mentioned earlier. In order to find this set, we discard the boxes (a, b) which fail condition (…) for more than k measurements, where the index k, initially set to zero, numbers the consecutive trials. The value of k at which we succeed for the first time tells us whether there are any outliers in the investigated data set and, if so, how many of them there are. Their identification is immediate: they are the measurements that do not satisfy condition (…), evaluated with the most recently obtained intervals a and b. Note that no a priori knowledge of which measurements are suspect is needed.
The method outlined above should be very robust: it should be able to detect and identify quite a large number of outliers, even if they constitute up to 50% of all measurements. I have performed only a very limited number of tests, with only one or two outliers, and they show that the method works as expected. It obviously always terminates, at the latest when only two measurements remain.
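The outlier search can be sketched as follows. This is not the slicing algorithm of the paper: a plain branch-and-prune bisection of the $(a, b)$ plane stands in for it, the starting box, the stopping width `eps` and all names (`Interval`, `missed`, `crude_hull`, `find_outliers`) are hypothetical, and no outward rounding is done. A box is discarded only when interval evaluation proves that every line in it misses more than $k$ rectangles, so the leaves that remain give an outer estimate of the crude solution set for that $k$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; plain floats, no outward rounding (not rigorous)."""
    lo: float
    hi: float

    def __add__(self, o): return Interval(self.lo + o.lo, self.hi + o.hi)

    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def misses(self, o): return self.hi < o.lo or o.hi < self.lo

    def width(self): return self.hi - self.lo


def missed(a, b, data):
    """Indices k of rectangles (x_k, y_k) that no line in the box (a, b) can meet."""
    return [k for k, (x, y) in enumerate(data) if (a * x + b).misses(y)]


def crude_hull(data, a0, b0, k, eps=0.02):
    """Hull of leaf boxes not proven to miss more than k rectangles (bisection stand-in)."""
    stack, kept = [(a0, b0)], []
    while stack:
        a, b = stack.pop()
        if len(missed(a, b, data)) > k:
            continue  # every line in this box misses more than k rectangles
        if max(a.width(), b.width()) < eps:
            kept.append((a, b))
        elif a.width() >= b.width():
            m = (a.lo + a.hi) / 2
            stack += [(Interval(a.lo, m), b), (Interval(m, a.hi), b)]
        else:
            m = (b.lo + b.hi) / 2
            stack += [(a, Interval(b.lo, m)), (a, Interval(m, b.hi))]
    if not kept:
        return None
    return (Interval(min(a.lo for a, _ in kept), max(a.hi for a, _ in kept)),
            Interval(min(b.lo for _, b in kept), max(b.hi for _, b in kept)))


def find_outliers(data, a0, b0):
    """Raise k until crude solutions appear; report k and the rectangles the hull misses."""
    for k in range(len(data) - 1):  # with k = n - 2 only two rectangles must be met
        hull = crude_hull(data, a0, b0, k)
        if hull is not None:
            return k, missed(*hull, data), hull
    return None
```

For instance, with four small rectangles around the line y = x + 1 and a fifth one far above it, the loop should stop at k = 1 and report only that fifth rectangle, without being told in advance which measurement is suspect.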

7.2 Finding the asymptotic straight line

Sometimes we need to find, based on experimental information, the equation of a line which is asymptotic or tangent to the investigated curve. The interval methods described in this paper may be of some help in eliminating the subjectivity of the person processing this sort of experimental data. Nowhere in the literature was I able to find a prescription on how to deal with problems of this kind. Yet they are important, and so is the reliable determination of their relevant parameters, including the uncertainties. Consider, for example, the process described by the formula

$$ y(t) = \ldots \tag{32} $$

where the number of various subprocesses, $k$, need not be known precisely in advance, and the unknown relaxation times $\tau$ are well separated and sorted in increasing order. Recording such a process, especially near $t = 0$, we obtain

$$ y(t) \approx y(0) - \ldots \tag{33} $$

that is, the equation of a straight line, provided that the entire process is initially dominated by the subprocess with the shortest relaxation time. This equation is commonly considered when studying the kinetics of (photo)chemical reactions, radioactive decay, multilevel relaxation and many other processes. Similar expressions can be obtained for the initial permeability of ferromagnetic materials or the susceptibility of paramagnetics. In all such cases the interesting physical parameters are hidden in the slope of the corresponding straight line tangent to the experimental curve. How can we obtain reliable values of the parameters of such a line?

Suppose that we have found the interval hull of the united solutions describing the first $n$ experimental points, $(\mathbf{a}_n, \mathbf{b}_n)$. Observe now what should happen when we enrich the data set with one more measurement and try to find the united solutions again (and the experimental points follow the straight line!). It is obvious that $(\mathbf{a}_{n+1}, \mathbf{b}_{n+1}) \subseteq (\mathbf{a}_n, \mathbf{b}_n)$, as long as the measurements are correct. This simple observation is the basis for the proposed method. Starting with the first two measurements (in this case the united solutions always exist), keep finding the united solutions, enlarging the set of measurements by one at every next trial. Finish when either $(\mathbf{a}_{n+1}, \mathbf{b}_{n+1}) \not\subseteq (\mathbf{a}_n, \mathbf{b}_n)$ or $(\mathbf{a}_{n+1}, \mathbf{b}_{n+1}) = \emptyset$, whichever happens first. The solution is then $(\mathbf{a}_n, \mathbf{b}_n)$. Of course, the data should first be properly ordered, as either an increasing or a decreasing sequence of x's.
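A sketch of this growing-data-set procedure, again with a plain bisection (branch and prune) standing in for the slicing algorithm of the paper. The starting box, `eps`, and the names `Interval`, `united_hull` and `tangent_line` are hypothetical, no outward rounding is done, and the measurements are assumed to be already ordered by x.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; plain floats, no outward rounding (not rigorous)."""
    lo: float
    hi: float

    def __add__(self, o): return Interval(self.lo + o.lo, self.hi + o.hi)

    def __mul__(self, o):
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def misses(self, o): return self.hi < o.lo or o.hi < self.lo

    def within(self, o): return o.lo <= self.lo and self.hi <= o.hi

    def width(self): return self.hi - self.lo


def united_hull(data, a0, b0, eps=0.05):
    """Outer estimate of the hull of lines meeting every rectangle in data.

    Stand-in for the slicing algorithm: plain bisection (branch and prune).
    """
    stack, kept = [(a0, b0)], []
    while stack:
        a, b = stack.pop()
        if any((a * x + b).misses(y) for x, y in data):
            continue  # every line in this box misses some rectangle
        if max(a.width(), b.width()) < eps:
            kept.append((a, b))
        elif a.width() >= b.width():
            m = (a.lo + a.hi) / 2
            stack += [(Interval(a.lo, m), b), (Interval(m, a.hi), b)]
        else:
            m = (b.lo + b.hi) / 2
            stack += [(a, Interval(b.lo, m)), (a, Interval(m, b.hi))]
    if not kept:
        return None
    return (Interval(min(a.lo for a, _ in kept), max(a.hi for a, _ in kept)),
            Interval(min(b.lo for _, b in kept), max(b.hi for _, b in kept)))


def tangent_line(data, a0, b0):
    """Add ordered measurements one by one; stop when the hull is empty or not nested."""
    n, hull = 2, united_hull(data[:2], a0, b0)
    if hull is None:
        return None  # starting box (a0, b0) too small for the first two points
    while n < len(data):
        nxt = united_hull(data[:n + 1], a0, b0)
        if nxt is None or not (nxt[0].within(hull[0]) and nxt[1].within(hull[1])):
            break
        n, hull = n + 1, nxt
    return n, hull
```

Because the bisection tree in this sketch does not depend on the data, adding a measurement can only remove leaf boxes, so the computed hulls are always nested and the loop in practice stops on an empty hull. With four rectangles around y = 2x followed by one well above that line, it should return n = 4 and a hull containing a = 2, b = 0.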



8 Summary 

I have demonstrated how interval analysis can be helpful in the objective and reliable processing of experimental data. I have discussed various types of possible solutions of linear systems, which may be useful and desired under different circumstances. The methods presented here easily handle the classical cases, where the uncertainties affect only the dependent variable and the values of the independent variable are exact, as well as those with uncertainties in both variables. There is no need for the always more or less arbitrary weighting of the experimental data.